%%% vim:ai:tw=72:
\chapter{Intel MPX Extension}

Intel MPX (Memory Protection Extensions) provides 4 128-bit wide bound
registers (\reg{bnd0} - \reg{bnd3}).  For purposes of parameter passing
and function return, the lower 64 bits of \reg{bndN} specify lower bound
of the corresponding parameter, and the upper 64 bits specify upper bound
of the parameter.  The upper bound is represented in one's complement
form.\footnote{MPX is not supported under ILP32 since MPX requires 64-bit
pointers.}

\section{Parameter Passing and Returning of Values}
\label{mpx-calling-conventions}

A \code{POINTER} class is added for passing and returning pointer types
and \code{INTEGER} class is updated as follows:

\begin{description}
\item[POINTER] This class consists of pointer types.
\item[INTEGER] This class consists of integral types (except pointer types)
  that fit into one of the general purpose registers.
\end{description}

\subsection{Classification}

Pointers and integers are classified as:

\begin{itemize}
\item Pointers are in the POINTER class.
\item Arguments of types (signed and unsigned) \code{_Bool}, \code{char},
  \code{short}, \code{int}, \code{long} and \code{long long}
  are in the INTEGER class.
\end{itemize}

The classification of aggregate (structures and arrays) and union
types is updated as follows:

\begin{enumerate}
\item When a C++ object is passed by invisible reference, the object is
      replaced in the parameter list by a pointer that has class POINTER.
\item If one of the classes is POINTER, the result is the POINTER.
\end{enumerate}

\subsection{Passing}

For parameter passing, if the class is INTEGER or POINTER, the next
available register of the sequence \RDI, \RSI, \RDX, \RCX, \reg{r8}
and \reg{r9} is used.

\begin{figure}
\Hrule
  \caption{Bound Register Usage}
  \label{fig-bnd-reg-usage}
  \begin{center}
    \begin{tabular}{l|p{7.35cm}|l}
      \noalign{\smallskip}
      \multicolumn{1}{c}{} &
      \multicolumn{1}{c}{}&
      \multicolumn{1}{l}{Preserved across}\\
      \multicolumn{1}{c}{Register} &
      \multicolumn{1}{c}{Usage}&
      \multicolumn{1}{l}{function calls}\\
      \hline
\reg{bnd0}--\reg{bnd3} & used to pass/return bounds of pointer arguments/return values & No\\
    \end{tabular}

  \end{center}
\Hrule
\end{figure}

\subsubsection{Bounds passing}
\label{bounds_passing}
Intel MPX provides ISA extensions that allow passing bounds for a pointer
argument that specify memory area that may be legally accessed by
dereferencing the pointer.  This paragraph desribes how the bounds are
passed to the callee.

Several functions used in the description below are defined as follows:
\begin{description}
\item[BOUND_MAP_STORE(bnd, addr, ptr)] This function executes Intel MPX \code{bndstx}
  instruction.  \code{ptr} argument is used to initialize index field of the memory
  operand of the \code{bndstx} instruction, \code{addr} is encoded in base and/or
  displacement fields of the memory operand, \code{bnd} is encoded in the register
  operand.
\item[BOUND_MAP_LOAD(addr, ptr)] This function executes Intel MPX \code{bndldx}
  instruction. \code{ptr} argument is used to initialize index field of the memory
  operand of the \code{bndldx} instruction, \code{addr} is encoded in base and/or
  displacement fields of the memory operand.
\end{description}

The following algorithm is used to decide how bounds are passed for each
eightbyte:
\begin{enumerate}
\item If the class is INTEGER, the eightbyte is passed in a general
  purpose register and the function being called uses varargs or stdarg,
  then the class is converted to POINTER.  Artificial bounds that allow
  accessing all memory are created for pointers contained in the eightbyte.
\item If the class is POINTER and the eightbyte is passed in a general
  purpose register, then the bounds associated with the pointers contained
  in the eightbyte are passed in the next available registers from the sequence
  \reg{bnd0}, \reg{bnd1}, \reg{bnd2} and \reg{bnd3}.  If there are no bound registers available
  for the bounds of an argument, then the bounds are passed in a CPU defined manner
  by executing \code{BOUND_MAP_STORE(bnd, addr, ptr)} function, where \code{bnd} is
  the current bounds of the argument, \code{addr} is the address of stack location beyond
  the location of the callee's return address (that will be put on stack by the corresponding
  call instruction), \code{ptr} is the actual value of the pointer argument.
  For each call, there may be up to two such pointer arguments,
  the first one has its bounds associated with (\emph{<return address stack location>--8})
  address, and the second one - with (\emph{<return address stack location>--16}).
\item If the class is POINTER and the eightbyte is passed on the stack, or
  the class is MEMORY and the argument contains pointer members,
  then the bounds associated with each pointer contained in the eightbyte
  are passed in a CPU defined manner by executing \code{BOUND_MAP_STORE(bnd, addr, ptr)} function,
  where \code{bnd} is the current bounds of the pointer argument, \code{addr} is the address
  of the pointer argument's stack location, \code{ptr} is the actual value of the pointer argument.
  If the eightbyte may contain parts of partially overlapping
  pointers, then bounds associated with the pointers are ignored and special
  bounds that allow accessing all memory are passed for such pointers.
\end{enumerate}
The callee uses the same algorithm to classify the incoming parameters.
If a parameter is passed to the callee using \code{BOUND_MAP_STORE}, then
the callee fetches the passed bounds using \code{BOUND_MAP_LOAD(addr, ptr)},
where \code{addr} is the same address passed to the corresponding \code{BOUND_MAP_STORE}
in the caller, and \code{ptr} is the actual value of the pointer parameter fetched
by the callee either from a general purpose register or from a stack location.

When passing arguments with bounds to functions, function prototypes must
be provided.  Otherwise, the run-time behavior is undefined.

\subsection{Returning of Values}

The returning of values is updated:

\begin{itemize}
\item If the class is INTEGER or POINTER, the next available register
   of the sequence \RAX, \RDX is used.
\end{itemize}

\subsubsection{Returning of Bounds}
The returning of bounds is done according to the following algorithm:
\begin{enumerate}
\item Classify the return type with the classification algorithm.

\item If the type has class MEMORY, on return \reg{bnd0} must contain
  bounds of the ``hidden'' first argument that has been passed in by
  the caller in \RDI.

\item If the class is POINTER, the next available register of the sequence
  \reg{bnd0}, \reg{bnd1}, \reg{bnd2}, \reg{bnd3} is used to return bounds
  of the pointers contained in the eightbyte.
\end{enumerate}

As an example of bound passing conventions, consider the declaration
and the function call is show in Figure \ref{fig_bounds_passing_example}.
The corresponding bound registers allocation is given in
Figure \ref{fig_bounds_allocation_example}, the stack frame offset given
shows the frame before calling the function.

\begin{figure}[H]
\Hrule
\caption{Bounds Passing Example}
\label{fig_bounds_passing_example}
\begin{center}
\code{
\begin{tabular}{|l|}
\cline{1-1}
extern void func (int *p1, int *p2, int *p3,\\
\phantom{extern void func (}int *p4, int *p5, int x,\\
\phantom{extern void func (}int *p6);\\
\\
func(p1, p2, p3, p4, p5, x, p6);\\
\cline{1-1}
\end{tabular}
}
\end{center}
\Hrule
\end{figure}

\begin{figure}[H]
\Hrule
\caption{Bounds Allocation Example}
\label{fig_bounds_allocation_example}
\begin{center}
\begin{tabular}{ll|ll|ll|ll}
\multicolumn{2}{l}{Bound Registers} &
\multicolumn{2}{l}{Stack Frame Offset for} &
\multicolumn{2}{l}{General Purpose} &
\multicolumn{2}{l}{Stack Frame} \\
 & &\code{BOUND_MAP_STORE} & &Registers & &Offset\\
\hline
\reg{bnd0}: &\code{p1} &\code{-16:} &\code{p5} \footnotemark &\RDI: &\code{p1} &\code{0:} &\code{p6} \\
\reg{bnd1}: &\code{p2} &\code{-24:} &\code{p6} &\RSI: &\code{p2} \\
\reg{bnd2}: &\code{p3} & & &\RDX: &\code{p3} \\
\reg{bnd3}: &\code{p4} & & &\RCX: &\code{p4} \\
& & & &\reg{r8}: &\code{p5} \\
& & & &\reg{r9}: &\code{x}  \\
\end{tabular}
\end{center}
\Hrule
\end{figure}

\footnotetext{Before the call to \code{func()} the return address of the call
is not yet put on the stack, thus offset \code{-16} accounts for the push
of the return address that will be made by the call instruction.}

\subsection{Variable Argument Lists}

Passing variable arguments is updated as follows:

\subsubsection{The Register Save Area}
\label{bounds-reg-save}
If a function taking a variable argument list is compiled for Intel MPX,
then the bounds passed for the argument registers saved to the
register save area are saved for each argument register in the prolog
by executing \code{BOUND_MAP_STORE(bnd, addr, ptr)} function (\ref{bounds_passing}),
where \code{bnd} is the current bounds of the pointer argument, \code{addr} is
the address of the argument register's location in register save area and
\code{ptr} is the actual value of the argument register.

\subsubsection{The \codeindex{va_arg} Macro}

The \code{va_arg(l, type)} implementation is updated as follows:

\begin{enumerate}
\item
  Fetch \code{type} from \code{l->reg_save_area} with an offset of
  \code{l->gp_offset} and/or \code{l->fp_offset}.  This may require
  copying to a temporary location in case the parameter is passed in
  different register classes or requires an alignment greater than 8 for
  general purpose registers and 16 for XMM registers.  If type specifies
  a pointer, then the bounds of the argument being fetched are loaded
  by executing\\
  \code{BOUND_MAP_LOAD(l->reg_save_area + l->gp_offset, ptr)}
  (\ref{bounds_passing}), where \code{ptr} is the actual value fetched from
  \code{l->reg_save_area} with an offset of \code{l->gp_offset}.
\end{enumerate}

\section{Program Loading and Dynamic Linking}

To preserve bound registers for symbol lookup in branch instructions
with the \texttt{BND} (\code{0xf2}) prefix, linker should generate the
BND procedure linkage table (see figure \ref{bnd_small_med_plt}) together
with an additional procedure linkage table (see figure
\ref{ext_small_med_plt}).

\begin{figure}[H]
\Hrule
\caption{BND Procedure Linkage Table (small and medium models)}
\label{bnd_small_med_plt}
\begin{footnotesize}
\begin{verbatim}
  .PLT0: pushq    GOT+8(%rip)              # GOT[1]
         bnd jmp  *GOT+16(%rip)            # GOT[2]
         nopl     (%rax)
  .PLT1: pushq    $index1                  # 16 bytes from .PLT0
         bnd jmp  PLT0
         nopl     0(%rax,%rax,1)
  .PLT2: pushq    $index2                  # 16 bytes from .PLT1
         bnd jmp  .PLT0
         nopl     0(%rax,%rax,1)
  .PLT3: ...
\end{verbatim}%$
\end{footnotesize}
\Hrule
\end{figure}

\begin{figure}[H]
\Hrule
\caption{Additional Procedure Linkage Table (small and medium models)}
\label{ext_small_med_plt}
\begin{footnotesize}
\begin{verbatim}
  .APLT1: bnd jmp  name1@GOTPCREL(%rip)
          nop
  .APLT2: bnd jmp  name2@GOTPCREL(%rip)    # 8 bytes from .APLT1
          nop
  .APLT3: ...                              # 8 bytes from .APLT2
\end{verbatim}%$
\end{footnotesize}
\Hrule
\end{figure}

To support indirect branches with the \texttt{BND} (\code{0xf2}) prefix
(see figure \ref{indirect_branch}), branches in all BND procedure linkage
table entries must have the \texttt{BND} (\code{0xf2}) prefix.

\begin{figure}[H]
\Hrule
\caption{Indirect branch}
\label{indirect_branch}
\begin{footnotesize}
\begin{verbatim}
  movq     fp@GOTPCREL(%rip), %rax
  bnd jmp  *(%rax)
  ...
  .globl   fp
  .section .data.rel,"aw",@progbits
  .align   8
  .type    fp, @object
  .size    fp, 8
fp:
  .quad    memcpy
\end{verbatim}%$
\end{footnotesize}
\Hrule
\end{figure}

When the BND procedure linkage table is used, the initial value of the
global offset table entry for the external function is the address of the
corresponding entry of the additional procedure linkage table.

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "abi"
%%% End:
